Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
From Data Center(s)
License:
LDC
Size:
80,281 tokens
Production Status:
Existing-used
Use:
Information Extraction, Information Retrieval
-
Paper title:Towards Open Domain Event Trigger Identification using Adversarial Domain Adaptation
-
Paper track:Short/Information Retrieval and Text Mining
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Aakanksha Naik | TimeBank 1.2 | /N |
Documentation:
https://catalog.ldc.upenn.edu/docs/LDC2006T08/timeml_annguide_1.2.1.pdf
Speech/Written
,
Language Type:
Monolingual
Languages:
English
Availability:
Restricted data from a clinic
License:
Size:
None
Production Status:
Existing-used
Use:
Topic Detection and Tracking
-
Paper title:Towards end-2-end learning for predicting behavior codes from spoken utterances in psychotherapy conversations
-
Paper track:Short/Speech and Multimodality
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Karan Singla | MISC coded Psychotherapy data | /N |
Documentation:
None
Written
Treebank,
Language Type:
Multilingual
Languages:
Chinese English French German Italian Japanese Russian Spanish
Availability:
Freely Available
License:
CreativeCommons
Size:
None
Production Status:
Existing-used
Use:
Parsing and Tagging
-
Paper title:Why Overfitting Isn't Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries
-
Paper track:Short/Machine Learning for NLP
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Mozhi Zhang | Universal Dependencies | /N |
Documentation:
None
Written
Evaluation Data,
Language Type:
Multilingual
Languages:
Chinese English French German Italian Japanese Russian Spanish
Availability:
From NIST
License:
Size:
None
Production Status:
Existing-used
Use:
Document Classification, Text categorisation
-
Paper title:Why Overfitting Isn't Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries
-
Paper track:Short/Machine Learning for NLP
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Mozhi Zhang | Reuters RCV1/RCV2 Multilingual Corpus | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
Apache 2.0
Size:
7148 KByte
Production Status:
Newly created-finished
Use:
Information Extraction, Information Retrieval
-
Paper title:SciREX: A Challenge Dataset for Document-Level Information Extraction
-
Paper track:Long/Information Extraction
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Sarthak Jain | SciREX | /N |
Documentation:
None
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
License:
Size:
None
Production Status:
Existing-used
Use:
-
Paper title:Speakers enhance contextually confusable words
-
Paper track:Long/Cognitive Modeling and Psycholinguistics
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Leon Bergen | The Buckeye corpus | /N |
Documentation:
None
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
License:
Size:
None
Production Status:
Existing-used
Use:
-
Paper title:Speakers enhance contextually confusable words
-
Paper track:Long/Cognitive Modeling and Psycholinguistics
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Leon Bergen | NXT Switchboard | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
Size:
1.1 GByte
Production Status:
Existing-used
Use:
Evaluation/Validation
-
Paper title:Fact-based Content Weighting for Evaluating Abstractive Summarisation
-
Paper track:Short/Summarization
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Xinnuo Xu | XSum | /N |
Documentation:
None
Written
Corpus,
Language Type:
Bilingual
Languages:
Chinese English
Availability:
Freely Available
License:
BSD
Size:
6 GByte
Production Status:
Existing-used
Use:
Summarisation
-
Paper title:Attend, Translate and Summarize: An Efficient Method for Neural Cross-Lingual Summarization
-
Paper track:Long/Summarization
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Junnan Zhu | NCLS-corpora | /N |
Documentation:
English documentation available at https://github.com/ZNLP/NCLS-Corpora
Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
CC-BY-4.0
Size:
None
Production Status:
Existing-used
Use:
Evaluation/Validation
-
Paper title:Exploring Unexplored Generalization Challenges for Cross-Database Semantic Parsing
-
Paper track:Long/Theme
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Alane Suhr | Advising | /N |
Documentation:
https://github.com/jkkummerfeld/text2sql-data




